Dataset statistics
| Number of variables | 22 |
|---|---|
| Number of observations | 39759 |
| Missing cells | 15903 |
| Missing cells (%) | 1.8% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 6.7 MiB |
| Average record size in memory | 176.0 B |
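The percentage and memory rows above follow from the raw counts. The sketch below recomputes them, assuming every cell occupies 8 bytes (int64/float64 values or object pointers). The report does not state this, but it is consistent with the 310.6 KiB reported for each variable.

```python
# Sketch: re-derive the overview percentages and memory figures from the raw counts.
# Assumption: every cell occupies 8 bytes (not stated in the report).
N_VARS = 22
N_OBS = 39_759
MISSING_CELLS = 15_903
BYTES_PER_CELL = 8  # assumed

missing_pct = 100 * MISSING_CELLS / (N_VARS * N_OBS)
record_bytes = N_VARS * BYTES_PER_CELL
total_mib = N_OBS * record_bytes / 2**20
column_kib = N_OBS * BYTES_PER_CELL / 2**10

print(f"Missing cells (%): {missing_pct:.1f}%")      # Missing cells (%): 1.8%
print(f"Average record size: {record_bytes:.1f} B")  # Average record size: 176.0 B
print(f"Total size: {total_mib:.1f} MiB")            # Total size: 6.7 MiB
print(f"Per-column size: {column_kib:.1f} KiB")      # Per-column size: 310.6 KiB
```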
Variable types
| NUM | 17 |
|---|---|
| BOOL | 3 |
| CAT | 2 |
Reproduction
| Analysis started | 2020-06-07 11:36:14.778501 |
|---|---|
| Analysis finished | 2020-06-07 11:37:12.522602 |
| Duration | 57.74 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
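The Duration row is the difference between the two timestamps above. Here is a quick check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table above.
started = datetime.fromisoformat("2020-06-07 11:36:14.778501")
finished = datetime.fromisoformat("2020-06-07 11:37:12.522602")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 57.74 seconds
```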
Warnings
| QUARTER_OF_YEAR is highly correlated with MONTH | High correlation |
|---|---|
| MONTH is highly correlated with QUARTER_OF_YEAR | High correlation |
| MULTIPLE_OFFENSE has 15903 (40.0%) missing values | Missing |
| X_10 is highly skewed (γ1 = 30.92348051) | Skewed |
| X_12 is highly skewed (γ1 = 26.64404103) | Skewed |
| INCIDENT_ID has unique values | Unique |
| X_4 has 5588 (14.1%) zeros | Zeros |
| X_5 has 7908 (19.9%) zeros | Zeros |
| X_7 has 5794 (14.6%) zeros | Zeros |
| X_8 has 14634 (36.8%) zeros | Zeros |
| X_11 has 4268 (10.7%) zeros | Zeros |
| X_12 has 8517 (21.4%) zeros | Zeros |
| X_14 has 458 (1.2%) zeros | Zeros |
| X_15 has 1680 (4.2%) zeros | Zeros |
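The Skewed and Missing warnings are threshold rules applied to each variable's summary. The sketch below assumes a skewness threshold of |γ1| > 20. That value is inferred from this report (X_8, with γ1 ≈ 14.43, is not flagged, while X_12 at 26.64 is), not read from config.yaml, and the helper `warnings_for` is a hypothetical name.

```python
# Sketch of threshold-based warnings over per-variable summaries from this report.
# Assumption: "Skewed" fires when |skewness| > 20 (inferred, not read from config.yaml).
SKEW_THRESHOLD = 20.0
N_OBS = 39_759

summaries = {
    # variable: (skewness, number of missing values)
    "X_8": (14.4255057, 0),
    "X_10": (30.92348051, 0),
    "X_12": (26.64404103, 0),
    "X_15": (-2.54436585, 0),
    "MULTIPLE_OFFENSE": (None, 15_903),  # boolean variable: no skewness
}

def warnings_for(name, skewness, n_missing, n_obs=N_OBS):
    """Return warning strings worded like the report's Warnings table (hypothetical helper)."""
    out = []
    if n_missing > 0:
        out.append(f"{name} has {n_missing} ({100 * n_missing / n_obs:.1f}%) missing values")
    if skewness is not None and abs(skewness) > SKEW_THRESHOLD:
        out.append(f"{name} is highly skewed (γ1 = {skewness})")
    return out

for name, (skew, missing) in summaries.items():
    for warning in warnings_for(name, skew, missing):
        print(warning)
```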
df_index
Real number (ℝ≥0)
| Distinct count | 23856 |
|---|---|
| Unique (%) | 60.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10336.960009054554 |
|---|---|
| Minimum | 0 |
| Maximum | 23855 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 993.9 |
| Q1 | 4969.5 |
| median | 9939 |
| Q3 | 14909 |
| 95-th percentile | 21867.1 |
| Maximum | 23855 |
| Range | 23855 |
| Interquartile range (IQR) | 9939.5 |
Descriptive statistics
| Standard deviation | 6378.244484 |
|---|---|
| Coefficient of variation (CV) | 0.617032907 |
| Kurtosis | -0.893746517 |
| Mean | 10336.96001 |
| Median Absolute Deviation (MAD) | 4970 |
| Skewness | 0.2791311245 |
| Sum | 410987193 |
| Variance | 40682002.69 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 2 | < 0.1% |
| 5646 | 2 | < 0.1% |
| 9768 | 2 | < 0.1% |
| 11817 | 2 | < 0.1% |
| 13866 | 2 | < 0.1% |
| 1580 | 2 | < 0.1% |
| 3629 | 2 | < 0.1% |
| 5678 | 2 | < 0.1% |
| 7727 | 2 | < 0.1% |
| 9800 | 2 | < 0.1% |
| Other values (23846) | 39739 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | < 0.1% |
| 1 | 2 | < 0.1% |
| 2 | 2 | < 0.1% |
| 3 | 2 | < 0.1% |
| 4 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23855 | 1 | < 0.1% |
| 23854 | 1 | < 0.1% |
| 23853 | 1 | < 0.1% |
| 23852 | 1 | < 0.1% |
| 23851 | 1 | < 0.1% |
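The df_index statistics are consistent with a hypothesis that the report itself does not state. The frame may have been built by concatenating a training split of 23856 rows (index 0 to 23855) and a test split of 15903 rows (index 0 to 15902) without resetting the index. That would explain why values 0 to 15902 appear twice. The split sizes are taken from the IS_TEST_DATA counts. The sketch below rebuilds that index and recomputes the summary. It relies on `statistics.quantiles(..., method="inclusive")` using the same linear interpolation as pandas' default quantile.

```python
import statistics

# Hypothesis (not stated in the report): train index 0..23855 concatenated with
# test index 0..15902. Sizes match the IS_TEST_DATA = 0 / 1 counts.
TRAIN_ROWS = 23_856
TEST_ROWS = 15_903

values = sorted(list(range(TRAIN_ROWS)) + list(range(TEST_ROWS)))

# Linear interpolation between order statistics, as in pandas' default quantile.
quartiles = statistics.quantiles(values, n=4, method="inclusive")
vigintiles = statistics.quantiles(values, n=20, method="inclusive")  # 5% steps

print("Sum:", sum(values))            # Sum: 410987193
print("Zeros:", values.count(0))      # Zeros: 2
print("Distinct:", len(set(values)))  # Distinct: 23856
print("5th / 95th percentile:", vigintiles[0], vigintiles[-1])
print("Q1 / median / Q3:", quartiles)
```

Under this hypothesis, the reported sum, zero count, distinct count and all five quantiles are reproduced.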
INCIDENT_ID
Categorical
| Distinct count | 39759 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 310.6 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| CR_77927 | 1 | < 0.1% |
| CR_152304 | 1 | < 0.1% |
| CR_10505 | 1 | < 0.1% |
| CR_128864 | 1 | < 0.1% |
| CR_28045 | 1 | < 0.1% |
| CR_2602 | 1 | < 0.1% |
| CR_149689 | 1 | < 0.1% |
| CR_112250 | 1 | < 0.1% |
| CR_25799 | 1 | < 0.1% |
| CR_152939 | 1 | < 0.1% |
| Other values (39749) | 39749 | > 99.9% |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 8.444201313 |
| Min length | 4 |
X_2
Real number (ℝ≥0)
| Distinct count | 52 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24.763776754948566 |
|---|---|
| Minimum | 0 |
| Maximum | 52 |
| Zeros | 40 |
| Zeros (%) | 0.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 7 |
| median | 24 |
| Q3 | 36 |
| 95-th percentile | 48 |
| Maximum | 52 |
| Range | 52 |
| Interquartile range (IQR) | 29 |
Descriptive statistics
| Standard deviation | 15.23552157 |
|---|---|
| Coefficient of variation (CV) | 0.6152341673 |
| Kurtosis | -1.307501292 |
| Mean | 24.76377675 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | -0.09338549112 |
| Sum | 984583 |
| Variance | 232.1211175 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 6724 | 16.9% |
| 36 | 3657 | 9.2% |
| 33 | 3573 | 9.0% |
| 24 | 2257 | 5.7% |
| 21 | 2088 | 5.3% |
| 37 | 1606 | 4.0% |
| 45 | 1545 | 3.9% |
| 49 | 1486 | 3.7% |
| 3 | 1307 | 3.3% |
| 22 | 1091 | 2.7% |
| Other values (42) | 14425 | 36.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 40 | 0.1% |
| 1 | 33 | 0.1% |
| 2 | 194 | 0.5% |
| 3 | 1307 | 3.3% |
| 4 | 6724 | 16.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 25 | 0.1% |
| 51 | 162 | 0.4% |
| 50 | 279 | 0.7% |
| 49 | 1486 | 3.7% |
| 48 | 98 | 0.2% |
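Several rows in these tables are derived from others. CV is the standard deviation divided by the mean, the IQR is Q3 - Q1, the range is the maximum minus the minimum, and the variance is the square of the standard deviation. Here is a quick consistency check using the X_2 figures as copied from the report:

```python
# Reported summary statistics for X_2, copied from the tables above.
std, mean = 15.23552157, 24.76377675
q1, q3 = 7, 36
minimum, maximum = 0, 52
variance = 232.1211175

cv = std / mean              # coefficient of variation
iqr = q3 - q1                # interquartile range
value_range = maximum - minimum

print(f"CV = {cv:.6f}")          # CV = 0.615234
print("IQR =", iqr)              # IQR = 29
print("Range =", value_range)    # Range = 52
print(f"std**2 = {std ** 2:.4f}")  # agrees with the reported variance to 4 decimals
```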
X_4
Real number (ℝ≥0)
| Distinct count | 10 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.279735405820066 |
|---|---|
| Minimum | 0 |
| Maximum | 10 |
| Zeros | 5588 |
| Zeros (%) | 14.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 10 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.956637769 |
|---|---|
| Coefficient of variation (CV) | 0.6908459259 |
| Kurtosis | -1.018315231 |
| Mean | 4.279735406 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1871304045 |
| Sum | 170158 |
| Variance | 8.741706897 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 9078 | 22.8% |
| 2 | 7883 | 19.8% |
| 0 | 5588 | 14.1% |
| 7 | 4781 | 12.0% |
| 4 | 3369 | 8.5% |
| 3 | 3160 | 7.9% |
| 9 | 2320 | 5.8% |
| 10 | 2113 | 5.3% |
| 1 | 1461 | 3.7% |
| 5 | 6 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5588 | 14.1% |
| 1 | 1461 | 3.7% |
| 2 | 7883 | 19.8% |
| 3 | 3160 | 7.9% |
| 4 | 3369 | 8.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 2113 | 5.3% |
| 9 | 2320 | 5.8% |
| 7 | 4781 | 12.0% |
| 6 | 9078 | 22.8% |
| 5 | 6 | < 0.1% |
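The X_4 frequency table lists every distinct value, so the sum, mean, variance and standard deviation in the Descriptive statistics table can be recomputed from it. This is a minimal sketch, assuming the variance is the sample variance with divisor n - 1 (the pandas default). `weighted_moments` is an illustrative name.

```python
from math import sqrt

# X_4 value counts, copied from the frequency table above (all 10 distinct values).
counts = {0: 5588, 1: 1461, 2: 7883, 3: 3160, 4: 3369,
          5: 6, 6: 9078, 7: 4781, 9: 2320, 10: 2113}

def weighted_moments(counts):
    """Return n, sum, mean and sample variance (divisor n - 1) from a value -> count map."""
    n = sum(counts.values())
    total = sum(value * c for value, c in counts.items())
    mean = total / n
    squared_dev = sum(c * (value - mean) ** 2 for value, c in counts.items())
    return n, total, mean, squared_dev / (n - 1)

n, total, mean, variance = weighted_moments(counts)
print(n, total)                # 39759 170158
print(f"mean = {mean:.9f}")    # mean = 4.279735406
print(f"variance = {variance:.6f}, std = {sqrt(variance):.6f}")  # variance = 8.741707, std = 2.956638
```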
X_5
Real number (ℝ≥0)
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.4527528358359114 |
|---|---|
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 7908 |
| Zeros (%) | 19.9% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 1.96318419 |
|---|---|
| Coefficient of variation (CV) | 0.8004003343 |
| Kurtosis | -1.556820375 |
| Mean | 2.452752836 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1743576897 |
| Sum | 97519 |
| Variance | 3.854092163 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 12238 | 30.8% |
| 1 | 11252 | 28.3% |
| 3 | 8355 | 21.0% |
| 0 | 7908 | 19.9% |
| 2 | 6 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7908 | 19.9% |
| 1 | 11252 | 28.3% |
| 2 | 6 | < 0.1% |
| 3 | 8355 | 21.0% |
| 5 | 12238 | 30.8% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 12238 | 30.8% |
| 3 | 8355 | 21.0% |
| 2 | 6 | < 0.1% |
| 1 | 11252 | 28.3% |
| 0 | 7908 | 19.9% |
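The MAD row can also be checked from a complete frequency table without expanding it into individual rows. This sketch assumes MAD here means the unscaled median of |x - median(x)| (no 1.4826 factor), which agrees with the reported MAD of 2 for both X_4 and X_5. The helpers `weighted_median` and `mad` are illustrative names.

```python
def weighted_median(counts):
    """Median of the multiset described by a value -> count map (mean of middle pair if n is even)."""
    n = sum(counts.values())

    def kth(k):  # k-th smallest value, 0-based
        seen = 0
        for value in sorted(counts):
            seen += counts[value]
            if k < seen:
                return value
        raise IndexError(k)

    if n % 2:
        return kth(n // 2)
    return (kth(n // 2 - 1) + kth(n // 2)) / 2

def mad(counts):
    """Median absolute deviation around the median, computed on counts."""
    med = weighted_median(counts)
    deviations = {}
    for value, c in counts.items():
        d = abs(value - med)
        deviations[d] = deviations.get(d, 0) + c
    return weighted_median(deviations)

# X_5 value counts, copied from the frequency table above.
x5 = {0: 7908, 1: 11252, 2: 6, 3: 8355, 5: 12238}
print(weighted_median(x5), mad(x5))  # 3 2
```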
X_6
Real number (ℝ≥0)
| Distinct count | 19 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.126461933147212 |
|---|---|
| Minimum | 1 |
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 5 |
| Q3 | 8 |
| 95-th percentile | 15 |
| Maximum | 19 |
| Range | 18 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 4.463585046 |
|---|---|
| Coefficient of variation (CV) | 0.7285746806 |
| Kurtosis | 0.06079304921 |
| Mean | 6.126461933 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.9704193921 |
| Sum | 243582 |
| Variance | 19.92359146 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5794 | 14.6% |
| 5 | 4476 | 11.3% |
| 6 | 4390 | 11.0% |
| 4 | 3869 | 9.7% |
| 2 | 3863 | 9.7% |
| 15 | 3822 | 9.6% |
| 7 | 3728 | 9.4% |
| 3 | 2909 | 7.3% |
| 8 | 2356 | 5.9% |
| 9 | 2098 | 5.3% |
| Other values (9) | 2454 | 6.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5794 | 14.6% |
| 2 | 3863 | 9.7% |
| 3 | 2909 | 7.3% |
| 4 | 3869 | 9.7% |
| 5 | 4476 | 11.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 5 | < 0.1% |
| 18 | 264 | 0.7% |
| 17 | 183 | 0.5% |
| 16 | 1026 | 2.6% |
| 15 | 3822 | 9.6% |
X_7
Real number (ℝ≥0)
| Distinct count | 19 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.870947458437083 |
|---|---|
| Minimum | 0 |
| Maximum | 18 |
| Zeros | 5794 |
| Zeros (%) | 14.6% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 4 |
| Q3 | 7 |
| 95-th percentile | 12 |
| Maximum | 18 |
| Range | 18 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.870959307 |
|---|---|
| Coefficient of variation (CV) | 0.7947035644 |
| Kurtosis | 0.5203116861 |
| Mean | 4.870947458 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.7988064995 |
| Sum | 193664 |
| Variance | 14.98432596 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5794 | 14.6% |
| 6 | 4476 | 11.3% |
| 4 | 4390 | 11.0% |
| 2 | 3869 | 9.7% |
| 7 | 3863 | 9.7% |
| 10 | 3822 | 9.6% |
| 1 | 3728 | 9.4% |
| 5 | 2909 | 7.3% |
| 3 | 2356 | 5.9% |
| 8 | 2098 | 5.3% |
| Other values (9) | 2454 | 6.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5794 | 14.6% |
| 1 | 3728 | 9.4% |
| 2 | 3869 | 9.7% |
| 3 | 2356 | 5.9% |
| 4 | 4390 | 11.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 18 | 240 | 0.6% |
| 17 | 327 | 0.8% |
| 16 | 339 | 0.9% |
| 15 | 39 | 0.1% |
| 14 | 31 | 0.1% |
X_8
Real number (ℝ≥0)
| Distinct count | 27 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9781684650016349 |
|---|---|
| Minimum | 0 |
| Maximum | 99 |
| Zeros | 14634 |
| Zeros (%) | 36.8% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 99 |
| Range | 99 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.46042113 |
|---|---|
| Coefficient of variation (CV) | 1.49301596 |
| Kurtosis | 652.7401544 |
| Mean | 0.978168465 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 14.4255057 |
| Sum | 38891 |
| Variance | 2.132829876 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 18329 | 46.1% |
| 0 | 14634 | 36.8% |
| 2 | 3772 | 9.5% |
| 3 | 1592 | 4.0% |
| 4 | 673 | 1.7% |
| 5 | 350 | 0.9% |
| 6 | 152 | 0.4% |
| 7 | 61 | 0.2% |
| 8 | 54 | 0.1% |
| 10 | 41 | 0.1% |
| Other values (17) | 101 | 0.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 14634 | 36.8% |
| 1 | 18329 | 46.1% |
| 2 | 3772 | 9.5% |
| 3 | 1592 | 4.0% |
| 4 | 673 | 1.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99 | 1 | < 0.1% |
| 50 | 3 | < 0.1% |
| 40 | 1 | < 0.1% |
| 30 | 2 | < 0.1% |
| 29 | 1 | < 0.1% |
X_9
Real number (ℝ≥0)
| Distinct count | 7 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.917980834528032 |
|---|---|
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 200 |
| Zeros (%) | 0.5% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 5 |
| Q3 | 6 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.367461734 |
|---|---|
| Coefficient of variation (CV) | 0.2780534899 |
| Kurtosis | 1.252125374 |
| Mean | 4.917980835 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -1.517828575 |
| Sum | 195534 |
| Variance | 1.869951595 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 17610 | 44.3% |
| 6 | 15781 | 39.7% |
| 2 | 5091 | 12.8% |
| 3 | 762 | 1.9% |
| 1 | 310 | 0.8% |
| 0 | 200 | 0.5% |
| 4 | 5 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 200 | 0.5% |
| 1 | 310 | 0.8% |
| 2 | 5091 | 12.8% |
| 3 | 762 | 1.9% |
| 4 | 5 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 15781 | 39.7% |
| 5 | 17610 | 44.3% |
| 4 | 5 | < 0.1% |
| 3 | 762 | 1.9% |
| 2 | 5091 | 12.8% |
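The X_9 frequency table is also complete, so its skewness and kurtosis can be reproduced. The sketch assumes these are the bias-corrected estimators G1 and G2 computed by pandas' `Series.skew()` and `Series.kurt()`. The kurtosis is excess kurtosis, which is 0 for a normal distribution. `skew_kurt` is an illustrative name.

```python
from math import sqrt

# X_9 value counts, copied from the frequency table above (all 7 distinct values).
counts = {5: 17610, 6: 15781, 2: 5091, 3: 762, 1: 310, 0: 200, 4: 5}

def skew_kurt(counts):
    """Bias-corrected sample skewness (G1) and excess kurtosis (G2) from a value -> count map."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Central moments with divisor n.
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    m4 = sum(c * (v - mean) ** 4 for v, c in counts.items()) / n
    g1 = m3 / m2 ** 1.5          # biased skewness
    g2 = m4 / m2 ** 2 - 3        # biased excess kurtosis
    G1 = sqrt(n * (n - 1)) / (n - 2) * g1
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return G1, G2

skewness, kurtosis = skew_kurt(counts)
# Compare with the report: skewness -1.517828575, kurtosis 1.252125374.
print(f"skewness = {skewness:.6f}, kurtosis = {kurtosis:.6f}")
```

Dropping the bias corrections (returning g1 and g2) moves the kurtosis by about 3e-4, so the match to the reported values supports the assumption.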
X_10
Real number (ℝ≥0)
| Distinct count | 26 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.243366281848135 |
|---|---|
| Minimum | 1 |
| Maximum | 90 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 90 |
| Range | 89 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.017419435 |
|---|---|
| Coefficient of variation (CV) | 0.8182781294 |
| Kurtosis | 2000.81086 |
| Mean | 1.243366282 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 30.92348051 |
| Sum | 49435 |
| Variance | 1.035142307 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33618 | 84.6% |
| 2 | 4532 | 11.4% |
| 3 | 924 | 2.3% |
| 4 | 364 | 0.9% |
| 5 | 114 | 0.3% |
| 6 | 92 | 0.2% |
| 8 | 25 | 0.1% |
| 10 | 25 | 0.1% |
| 7 | 23 | 0.1% |
| 9 | 11 | < 0.1% |
| Other values (16) | 31 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33618 | 84.6% |
| 2 | 4532 | 11.4% |
| 3 | 924 | 2.3% |
| 4 | 364 | 0.9% |
| 5 | 114 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 90 | 1 | < 0.1% |
| 58 | 1 | < 0.1% |
| 50 | 1 | < 0.1% |
| 40 | 2 | < 0.1% |
| 30 | 1 | < 0.1% |
X_11
Real number (ℝ≥0)
| Distinct count | 150 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 206.95434995849996 |
|---|---|
| Minimum | 0 |
| Maximum | 332 |
| Zeros | 4268 |
| Zeros (%) | 10.7% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 174 |
| median | 249 |
| Q3 | 249 |
| 95-th percentile | 316 |
| Maximum | 332 |
| Range | 332 |
| Interquartile range (IQR) | 75 |
Descriptive statistics
| Standard deviation | 93.0619573 |
|---|---|
| Coefficient of variation (CV) | 0.4496738403 |
| Kurtosis | 0.192539772 |
| Mean | 206.95435 |
| Median Absolute Deviation (MAD) | 67 |
| Skewness | -0.9031502716 |
| Sum | 8228298 |
| Variance | 8660.527897 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 174 | 12100 | 30.4% |
| 249 | 11552 | 29.1% |
| 316 | 7577 | 19.1% |
| 0 | 4268 | 10.7% |
| 303 | 707 | 1.8% |
| 127 | 519 | 1.3% |
| 179 | 357 | 0.9% |
| 74 | 334 | 0.8% |
| 102 | 208 | 0.5% |
| 263 | 176 | 0.4% |
| Other values (140) | 1961 | 4.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4268 | 10.7% |
| 1 | 3 | < 0.1% |
| 6 | 3 | < 0.1% |
| 11 | 7 | < 0.1% |
| 12 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 332 | 4 | < 0.1% |
| 330 | 39 | 0.1% |
| 329 | 31 | 0.1% |
| 328 | 120 | 0.3% |
| 327 | 2 | < 0.1% |
X_12
Real number (ℝ≥0)
| Distinct count | 24 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9735405820065897 |
|---|---|
| Minimum | 0.0 |
| Maximum | 90.0 |
| Zeros | 8517 |
| Zeros (%) | 21.4% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 90 |
| Range | 90 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.056816506 |
|---|---|
| Coefficient of variation (CV) | 1.085539242 |
| Kurtosis | 1723.772266 |
| Mean | 0.973540582 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 26.64404103 |
| Sum | 38707 |
| Variance | 1.116861127 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 26513 | 66.7% |
| 0 | 8517 | 21.4% |
| 2 | 3420 | 8.6% |
| 3 | 797 | 2.0% |
| 4 | 276 | 0.7% |
| 5 | 101 | 0.3% |
| 6 | 59 | 0.1% |
| 8 | 18 | < 0.1% |
| 7 | 14 | < 0.1% |
| 10 | 11 | < 0.1% |
| Other values (14) | 33 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8517 | 21.4% |
| 1 | 26513 | 66.7% |
| 2 | 3420 | 8.6% |
| 3 | 797 | 2.0% |
| 4 | 276 | 0.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 90 | 1 | < 0.1% |
| 58 | 1 | < 0.1% |
| 50 | 1 | < 0.1% |
| 40 | 2 | < 0.1% |
| 30 | 1 | < 0.1% |
X_13
Real number (ℝ≥0)
| Distinct count | 68 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 85.21886868382002 |
|---|---|
| Minimum | 0 |
| Maximum | 117 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 72 |
| median | 98 |
| Q3 | 103 |
| 95-th percentile | 112 |
| Maximum | 117 |
| Range | 117 |
| Interquartile range (IQR) | 31 |
Descriptive statistics
| Standard deviation | 27.55532481 |
|---|---|
| Coefficient of variation (CV) | 0.3233476956 |
| Kurtosis | 1.1341156 |
| Mean | 85.21886868 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | -1.398063774 |
| Sum | 3388217 |
| Variance | 759.2959255 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 103 | 11775 | 29.6% |
| 72 | 7612 | 19.1% |
| 92 | 5353 | 13.5% |
| 112 | 3468 | 8.7% |
| 98 | 2307 | 5.8% |
| 18 | 1399 | 3.5% |
| 24 | 886 | 2.2% |
| 109 | 848 | 2.1% |
| 12 | 702 | 1.8% |
| 59 | 560 | 1.4% |
| Other values (58) | 4849 | 12.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | < 0.1% |
| 1 | 8 | < 0.1% |
| 2 | 382 | 1.0% |
| 7 | 2 | < 0.1% |
| 8 | 3 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 117 | 1 | < 0.1% |
| 116 | 466 | 1.2% |
| 115 | 31 | 0.1% |
| 114 | 20 | 0.1% |
| 113 | 367 | 0.9% |
X_14
Real number (ℝ≥0)
| Distinct count | 69 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 72.49201438667974 |
|---|---|
| Minimum | 0 |
| Maximum | 142 |
| Zeros | 458 |
| Zeros (%) | 1.2% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 25 |
| Q1 | 29 |
| median | 62 |
| Q3 | 107 |
| 95-th percentile | 142 |
| Maximum | 142 |
| Range | 142 |
| Interquartile range (IQR) | 78 |
Descriptive statistics
| Standard deviation | 43.35376456 |
|---|---|
| Coefficient of variation (CV) | 0.5980488323 |
| Kurtosis | -1.324487842 |
| Mean | 72.49201439 |
| Median Absolute Deviation (MAD) | 33 |
| Skewness | 0.2532434153 |
| Sum | 2882210 |
| Variance | 1879.548901 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 29 | 13659 | 34.4% |
| 93 | 5140 | 12.9% |
| 142 | 4557 | 11.5% |
| 62 | 4070 | 10.2% |
| 80 | 2529 | 6.4% |
| 130 | 1976 | 5.0% |
| 107 | 1234 | 3.1% |
| 14 | 1158 | 2.9% |
| 119 | 943 | 2.4% |
| 103 | 842 | 2.1% |
| Other values (59) | 3651 | 9.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 458 | 1.2% |
| 2 | 1 | < 0.1% |
| 6 | 213 | 0.5% |
| 10 | 1 | < 0.1% |
| 12 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 142 | 4557 | 11.5% |
| 140 | 108 | 0.3% |
| 139 | 13 | < 0.1% |
| 138 | 227 | 0.6% |
| 136 | 101 | 0.3% |
X_15
Real number (ℝ≥0)
| Distinct count | 36 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.44789858899872 |
|---|---|
| Minimum | 0 |
| Maximum | 50 |
| Zeros | 1680 |
| Zeros (%) | 4.2% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 23 |
| Q1 | 34 |
| median | 34 |
| Q3 | 34 |
| 95-th percentile | 46 |
| Maximum | 50 |
| Range | 50 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 8.357811091 |
|---|---|
| Coefficient of variation (CV) | 0.2498755211 |
| Kurtosis | 8.811395375 |
| Mean | 33.44789859 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -2.54436585 |
| Sum | 1329855 |
| Variance | 69.85300624 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 34 | 31646 | 79.6% |
| 43 | 2504 | 6.3% |
| 0 | 1680 | 4.2% |
| 46 | 1079 | 2.7% |
| 23 | 1063 | 2.7% |
| 48 | 864 | 2.2% |
| 36 | 307 | 0.8% |
| 50 | 217 | 0.5% |
| 9 | 170 | 0.4% |
| 39 | 82 | 0.2% |
| Other values (26) | 147 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1680 | 4.2% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 50 | 217 | 0.5% |
| 48 | 864 | 2.2% |
| 47 | 1 | < 0.1% |
| 46 | 1079 | 2.7% |
| 43 | 2504 | 6.3% |
MULTIPLE_OFFENSE
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 15903 |
| Missing (%) | 40.0% |
| Memory size | 310.6 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 22788 | 57.3% |
| 0 | 1068 | 2.7% |
| (Missing) | 15903 | 40.0% |
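The 15903 missing values equal the IS_TEST_DATA = 1 count, and the last rows pair NaN with IS_TEST_DATA = 1. This suggests, although the report does not say so, that MULTIPLE_OFFENSE is a label that is missing only for the test split. Under that assumption, the class balance among labeled rows follows from the table above:

```python
# MULTIPLE_OFFENSE counts, copied from the table above; None stands for a missing value.
counts = {1: 22_788, 0: 1_068, None: 15_903}
n = sum(counts.values())

for value, c in counts.items():
    label = "(Missing)" if value is None else value
    print(f"{label}: {c} ({100 * c / n:.1f}%)")

# Assumption: missing rows are the unlabeled test split, so only 0/1 rows are "labeled".
labeled = counts[1] + counts[0]
print(f"Share of 1 among labeled rows: {100 * counts[1] / labeled:.1f}%")  # 95.5%
```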
IS_TEST_DATA
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 310.6 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 23856 | 60.0% |
| 1 | 15903 | 40.0% |
YEAR
Real number (ℝ≥0)
| Distinct count | 28 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2004.2260871752308 |
|---|---|
| Minimum | 1991 |
| Maximum | 2018 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 1991 |
|---|---|
| 5-th percentile | 1992 |
| Q1 | 1998 |
| median | 2004 |
| Q3 | 2011 |
| 95-th percentile | 2017 |
| Maximum | 2018 |
| Range | 27 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 7.788343872 |
|---|---|
| Coefficient of variation (CV) | 0.003885960732 |
| Kurtosis | -1.114256375 |
| Mean | 2004.226087 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.110890986 |
| Sum | 79686025 |
| Variance | 60.65830028 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2001 | 1924 | 4.8% |
| 1996 | 1752 | 4.4% |
| 2000 | 1685 | 4.2% |
| 1997 | 1617 | 4.1% |
| 2008 | 1602 | 4.0% |
| 2006 | 1561 | 3.9% |
| 2007 | 1551 | 3.9% |
| 1998 | 1536 | 3.9% |
| 1993 | 1532 | 3.9% |
| 2004 | 1521 | 3.8% |
| Other values (18) | 23478 | 59.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1991 | 879 | 2.2% |
| 1992 | 1303 | 3.3% |
| 1993 | 1532 | 3.9% |
| 1994 | 1192 | 3.0% |
| 1995 | 1490 | 3.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2018 | 1367 | 3.4% |
| 2017 | 1470 | 3.7% |
| 2016 | 1214 | 3.1% |
| 2015 | 1182 | 3.0% |
| 2014 | 1154 | 2.9% |
MONTH
Real number (ℝ≥0)
| Distinct count | 12 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.525591689931839 |
|---|---|
| Minimum | 1 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 7 |
| Q3 | 9 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.289973788 |
|---|---|
| Coefficient of variation (CV) | 0.5041648243 |
| Kurtosis | -1.138800574 |
| Mean | 6.52559169 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.03173595572 |
| Sum | 259451 |
| Variance | 10.82392752 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 3770 | 9.5% |
| 10 | 3698 | 9.3% |
| 7 | 3595 | 9.0% |
| 5 | 3566 | 9.0% |
| 4 | 3524 | 8.9% |
| 8 | 3501 | 8.8% |
| 6 | 3497 | 8.8% |
| 3 | 3385 | 8.5% |
| 11 | 3063 | 7.7% |
| 2 | 2853 | 7.2% |
| Other values (2) | 5307 | 13.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2798 | 7.0% |
| 2 | 2853 | 7.2% |
| 3 | 3385 | 8.5% |
| 4 | 3524 | 8.9% |
| 5 | 3566 | 9.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 2509 | 6.3% |
| 11 | 3063 | 7.7% |
| 10 | 3698 | 9.3% |
| 9 | 3770 | 9.5% |
| 8 | 3501 | 8.8% |
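QUARTER_OF_YEAR is flagged as highly correlated with MONTH. The sample rows are consistent with QUARTER_OF_YEAR being a deterministic function of MONTH, which would fully explain that warning. The mapping below is an assumption, because the report does not document how the column was built. It has been checked only against the ten first rows.

```python
def quarter_of_year(month: int) -> int:
    """Calendar quarter (1-4) for a month number (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return (month - 1) // 3 + 1

# (MONTH, QUARTER_OF_YEAR) pairs from the "First rows" table of this report.
sample = [(7, 3), (7, 3), (3, 1), (2, 1), (4, 2), (4, 2), (1, 1), (5, 2), (8, 3), (8, 3)]
assert all(quarter_of_year(m) == q for m, q in sample)
print([quarter_of_year(m) for m in range(1, 13)])  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```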
DAY
Real number (ℝ≥0)
| Distinct count | 31 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.543776251917805 |
|---|---|
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 8 |
| median | 15 |
| Q3 | 23 |
| 95-th percentile | 29 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 8.793849914 |
|---|---|
| Coefficient of variation (CV) | 0.5657473301 |
| Kurtosis | -1.175693136 |
| Mean | 15.54377625 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.01609533896 |
| Sum | 618005 |
| Variance | 77.3317963 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1611 | 4.1% |
| 15 | 1418 | 3.6% |
| 7 | 1381 | 3.5% |
| 12 | 1358 | 3.4% |
| 13 | 1356 | 3.4% |
| 20 | 1351 | 3.4% |
| 2 | 1347 | 3.4% |
| 9 | 1339 | 3.4% |
| 14 | 1338 | 3.4% |
| 18 | 1329 | 3.3% |
| Other values (21) | 25931 | 65.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1611 | 4.1% |
| 2 | 1347 | 3.4% |
| 3 | 1259 | 3.2% |
| 4 | 1241 | 3.1% |
| 5 | 1257 | 3.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 31 | 692 | 1.7% |
| 30 | 1181 | 3.0% |
| 29 | 1183 | 3.0% |
| 28 | 1248 | 3.1% |
| 27 | 1270 | 3.2% |
IS_WEEKEND
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 310.6 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 28702 | 72.2% |
| 1 | 11057 | 27.8% |
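IS_WEEKEND appears to be derivable from YEAR, MONTH and DAY. The report does not say so, but every one of the ten first rows agrees with a Saturday/Sunday rule. Here is a minimal sketch under that assumption:

```python
from datetime import date

def is_weekend(year: int, month: int, day: int) -> int:
    """1 if the date is a Saturday or Sunday, else 0 (date.weekday(): Monday=0 ... Sunday=6)."""
    return int(date(year, month, day).weekday() >= 5)

# (YEAR, MONTH, DAY, IS_WEEKEND) from the "First rows" table of this report.
sample = [
    (2004, 7, 4, 1), (2017, 7, 18, 0), (2017, 3, 15, 0), (2009, 2, 13, 0),
    (2005, 4, 13, 0), (2003, 4, 7, 0), (2008, 1, 22, 0), (1993, 5, 14, 0),
    (2016, 8, 21, 1), (1996, 8, 25, 1),
]
mismatches = [row for row in sample if is_weekend(*row[:3]) != row[3]]
print("mismatches:", mismatches)  # mismatches: []
```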
Pearson's r
The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) changes in scale of the two variables. For an exact linear relationship, this means the slope of the line does not affect r; only its sign does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
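A minimal pure-Python sketch of the definition above follows. The function name `pearson_r` is illustrative. It also shows that lines with very different positive slopes all give r = 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    # The 1/(n-1) factors of covariance and standard deviations cancel, so they are omitted.
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * v + 7 for v in x]))    # steep increasing line: close to 1.0
print(pearson_r(x, [0.1 * v - 2 for v in x]))  # shallow increasing line: close to 1.0
print(pearson_r(x, [-v for v in x]))           # decreasing line: close to -1.0
```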
Spearman's ρ
The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations. In other words, compute Pearson's r on the ranks.
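Spearman's ρ is Pearson's r applied to ranks. This sketch gives tied values the average of the ranks they span, which is the usual convention. The function names are illustrative. For a monotonic but nonlinear relationship, ρ reaches 1 while Pearson's r stays below 1.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys)))

def spearman_rho(xs, ys):
    """Pearson's r computed on the (average) ranks of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]      # strictly increasing but not linear
print(spearman_rho(x, y))    # close to 1.0
print(pearson_r(x, y))       # noticeably below 1
```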
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a variant; when ties are present, the τ-b variant adjusts the denominator.
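The sketch below implements the pair-counting definition of τ-a directly. It is O(n²) and meant for illustration; `kendall_tau_a` is an illustrative name. Note that SciPy's `scipy.stats.kendalltau` computes τ-b by default, which differs from τ-a only when ties are present.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2).
    Pairs tied in x or y count as neither concordant nor discordant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))      # 0.6666666666666666
```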
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package's documentation.
First rows
|  | df_index | INCIDENT_ID | X_2 | X_4 | X_5 | X_6 | X_7 | X_8 | X_9 | X_10 | X_11 | X_12 | X_13 | X_14 | X_15 | MULTIPLE_OFFENSE | IS_TEST_DATA | YEAR | MONTH | DAY | IS_WEEKEND | QUARTER_OF_YEAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CR_102659 | 36 | 2 | 1 | 5 | 6 | 1 | 6 | 1 | 174 | 1.0 | 92 | 29 | 36 | 0.0 | 0 | 2004 | 7 | 4 | 1 | 3 |
| 1 | 1 | CR_189752 | 37 | 0 | 0 | 11 | 17 | 1 | 6 | 1 | 236 | 1.0 | 103 | 142 | 34 | 1.0 | 0 | 2017 | 7 | 18 | 0 | 3 |
| 2 | 2 | CR_184637 | 3 | 3 | 5 | 1 | 0 | 2 | 3 | 1 | 174 | 1.0 | 110 | 93 | 34 | 1.0 | 0 | 2017 | 3 | 15 | 0 | 1 |
| 3 | 3 | CR_139071 | 33 | 2 | 1 | 7 | 1 | 1 | 6 | 1 | 249 | 1.0 | 72 | 29 | 34 | 1.0 | 0 | 2009 | 2 | 13 | 0 | 1 |
| 4 | 4 | CR_109335 | 33 | 2 | 1 | 8 | 3 | 0 | 5 | 1 | 174 | 0.0 | 112 | 29 | 43 | 1.0 | 0 | 2005 | 4 | 13 | 0 | 2 |
| 5 | 5 | CR_96263 | 45 | 10 | 3 | 1 | 0 | 1 | 6 | 1 | 303 | 1.0 | 72 | 62 | 34 | 1.0 | 0 | 2003 | 4 | 7 | 0 | 2 |
| 6 | 6 | CR_131400 | 30 | 7 | 3 | 7 | 1 | 0 | 5 | 1 | 174 | 0.0 | 112 | 29 | 43 | 1.0 | 0 | 2008 | 1 | 22 | 0 | 1 |
| 7 | 7 | CR_11981 | 8 | 7 | 3 | 9 | 8 | 0 | 5 | 1 | 316 | 1.0 | 72 | 62 | 34 | 1.0 | 0 | 1993 | 5 | 14 | 0 | 2 |
| 8 | 8 | CR_184134 | 49 | 6 | 5 | 8 | 3 | 1 | 1 | 1 | 316 | 1.0 | 103 | 14 | 34 | 1.0 | 0 | 2016 | 8 | 21 | 1 | 3 |
| 9 | 9 | CR_32634 | 4 | 6 | 5 | 15 | 10 | 0 | 5 | 2 | 145 | 1.0 | 103 | 29 | 34 | 0.0 | 0 | 1996 | 8 | 25 | 1 | 3 |
Last rows
|  | df_index | INCIDENT_ID | X_2 | X_4 | X_5 | X_6 | X_7 | X_8 | X_9 | X_10 | X_11 | X_12 | X_13 | X_14 | X_15 | MULTIPLE_OFFENSE | IS_TEST_DATA | YEAR | MONTH | DAY | IS_WEEKEND | QUARTER_OF_YEAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39749 | 15893 | CR_148375 | 3 | 3 | 5 | 1 | 0 | 0 | 5 | 2 | 249 | 2.0 | 103 | 80 | 34 | NaN | 1 | 2011 | 10 | 22 | 1 | 4 |
| 39750 | 15894 | CR_67736 | 21 | 4 | 1 | 6 | 4 | 0 | 5 | 1 | 174 | 1.0 | 98 | 93 | 34 | NaN | 1 | 2000 | 6 | 29 | 0 | 2 |
| 39751 | 15895 | CR_185890 | 5 | 3 | 5 | 8 | 3 | 1 | 6 | 1 | 249 | 1.0 | 72 | 29 | 34 | NaN | 1 | 2017 | 6 | 29 | 0 | 2 |
| 39752 | 15896 | CR_89868 | 3 | 3 | 5 | 1 | 0 | 1 | 6 | 1 | 0 | 1.0 | 72 | 29 | 34 | NaN | 1 | 2003 | 5 | 11 | 1 | 2 |
| 39753 | 15897 | CR_148343 | 3 | 3 | 5 | 1 | 0 | 3 | 6 | 1 | 303 | 1.0 | 72 | 29 | 34 | NaN | 1 | 2011 | 9 | 1 | 0 | 3 |
| 39754 | 15898 | CR_44468 | 22 | 7 | 3 | 15 | 10 | 0 | 5 | 1 | 174 | 0.0 | 72 | 29 | 43 | NaN | 1 | 1997 | 11 | 28 | 0 | 4 |
| 39755 | 15899 | CR_158460 | 35 | 3 | 5 | 1 | 0 | 2 | 3 | 2 | 0 | 2.0 | 72 | 93 | 34 | NaN | 1 | 2012 | 6 | 9 | 1 | 2 |
| 39756 | 15900 | CR_115946 | 26 | 9 | 0 | 6 | 4 | 2 | 6 | 1 | 0 | 1.0 | 72 | 62 | 34 | NaN | 1 | 2006 | 4 | 22 | 1 | 2 |
| 39757 | 15901 | CR_137663 | 21 | 4 | 1 | 2 | 7 | 1 | 6 | 2 | 249 | 2.0 | 92 | 62 | 34 | NaN | 1 | 2009 | 4 | 3 | 0 | 2 |
| 39758 | 15902 | CR_33545 | 4 | 6 | 5 | 4 | 2 | 5 | 6 | 1 | 249 | 1.0 | 72 | 29 | 34 | NaN | 1 | 1996 | 4 | 24 | 0 | 2 |